I. Project Objectives


Researchers in scientific computation recognize that achieving the speeds necessary
to solve the new, complex scientific and engineering problems of significant impact
to the DOD community requires a radical reorganization of traditional algorithms
in matrix analysis. It is not sufficient to just implement old algorithms in a parallel
processing environment. New fast algorithms for the modern generation of
supercomputers (such as the Cray X-MP) and mini-supercomputer systems (such
as the Alliant FX/8), as well as for new massively parallel multiprocessors, are
essential. To meet the challenges of this emerging generation of machines, the goal
of this project is to develop techniques in matrix computations for efficient
implementation on advanced architectures. Significantly, our work is being applied
to the practical real-world problems of structural optimization and least squares
estimation methods in signal processing.

In  the  area  of  structural  optimization  we  are  concerned  with  the  fundamental 
problem  of  elastic  analysis  -  that  of  finding  the  stresses  and  strains  and  solving 
optimal  redesign  problems,  given  a  finite  element  model  of  a  complex  structure  and 
a  set  of  external  loads.  To  obtain  the  solution  of  this  constrained  minimization 
problem,  a  variety  of  algorithms  involving  the  displacement  method  or  the  force 
method  can  be  applied.  While  the  advantages  of  implementing  one  of  these  methods 
over  the  other  on  serial  computers  have  been  widely  studied,  the  effects  of 
parallelism  in  performing  the  matrix  computations  have  not  received  a  great  deal  of 
attention  until  recently.  Our  work  on  this  topic  thus  far  has  led  to  publications  in 
Numerische  Mathematik,  the  SIAM  Journal  on  Algebraic  and  Discrete 
Methods,  the  SIAM  Journal  on  Scientific  and  Statistical  Computing,  and 
in  Computer  Methods  in  Applied  Mechanics  and  Engineering.  Our  goals 
here  continue  to  be  the  development  and  testing  of  complete  finite  element 
structural  optimization  packages  on  machines  such  as  the  Cray  X-MP  and  the 
Alliant  FX/8,  and  their  comparison  with  traditional  serial  methods  in  packages  such 
as NASTRAN.

Our main objective in least squares computations this year has been to
complete the error analysis and testing of new recursive orthogonal and hyperbolic
rotation algorithms for signal processing. This is joint work with a Ph.D. student,
C. T. Pan, and with S. T. Alexander from the NCSU Department of Electrical and
Computer Engineering. Our schemes are amenable to implementation on a variety
of vector and parallel processing systems, such as the Alliant FX/8 and the Intel
iPSC Hypercube. This work on near real-time algorithms has recently produced
some especially significant results, which may well have an impact on DOD
researchers who are interested in near real-time computations in, for example,
control and signal processing.

The goals of this research are to investigate the theoretical aspects of the
computations as well as to develop new technologies for solving important
problems in an efficient and stable way on modern high performance architectures.
There are two areas of particular excitement in our project. We are close to
establishing a framework for testing and comparing parallel algorithms for
structural optimization against the more traditional approaches in commercial
software. In addition, we are developing new tools for least squares computations
in signal processing, which are necessary to meet the challenges of solving near
real-time problems in a stable way on the new generation of multiprocessor systems.

Abstracts of some major findings obtained during the past year of this project are
provided in the next section.


II. Abstracts of Major Results


1.  Computational  Structural  Mechanics  on  High  Performance 
Architectures:  The  major  focus  of  this  work  is  a  detailed  study  of  the 
vectorization  and  parallelization  of  new  and  existing  variations  of  the  Displacement 
and  Force  Methods  in  the  engineering  analysis  of  large-scale  structures. 

Given the increasing demands on the structural engineer to analyze larger
and more complex structures, vectorization and/or multiprocessing of
the numerical schemes is essential. We have used two high performance
architectures in this work: an Alliant FX/8 and a Cray X-MP (made available by
the NSF at the University of Illinois NCSA). Implementation and performance
evaluations for a variety of approaches on these architectures have indicated that an
element-by-element preconditioned conjugate gradient scheme produces superior
performance. Some of the results of this study were presented at the First World
Congress on Computational Mechanics at Austin, TX. (This work includes a joint
project with M. W. Berry at the Illinois CSRD.)

2. Iterative Methods for Equality Constrained Least Squares Problems:
We consider the linear equality constrained least squares problem (LSE) of
minimizing the norm ||c - Gx||_2 subject to the constraint Ex = p. A
preconditioned conjugate gradient method is applied to the Kuhn-Tucker equations
associated with the LSE problem. This method is compared to a block SOR method
and is clearly superior to it. We show that the method is well suited for structural
optimization problems in reliability analysis and optimal design. Numerical tests
on an Alliant FX/8 and a Cray X-MP using some practical structural analysis data
exhibit the efficiency of the method. Applications also have been made to filtering
methods in signal processing. Here the scheme has the definite advantage that the
solution x is easy to update after a rank one modification of the matrix G. (This
is joint work with J. Barlow at Penn State University and Nancy Nichols at NCSU.)
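
For orientation, one common way to write the Kuhn-Tucker equations of the LSE problem is sketched below. It follows from the Lagrangian (1/2)||c - Gx||_2^2 + lambda^T (Ex - p). The multiplier vector lambda is notation introduced here, not taken from the abstract, and the abstract does not say which (possibly augmented) form of the system the method actually works with, so this should be read as an assumption.

```latex
\min_{x} \; \|c - Gx\|_2 \quad \text{subject to} \quad Ex = p
\qquad\Longrightarrow\qquad
\begin{pmatrix} G^{T}G & E^{T} \\ E & 0 \end{pmatrix}
\begin{pmatrix} x \\ \lambda \end{pmatrix}
=
\begin{pmatrix} G^{T}c \\ p \end{pmatrix}
```

The first block row is the stationarity condition G^T (Gx - c) + E^T lambda = 0, and the second is the constraint itself.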

3. A Two-Level Preconditioned Conjugate Gradient Scheme: The
conjugate gradient algorithm is one of the most efficient methods for solving a
variety of problems arising in signals, systems and control, and it has been
successfully implemented on various vector computers. In part as an effort to
efficiently implement this algorithm on parallel processors, a two-level
preconditioning scheme is proposed here and tested on an Alliant FX/8
multiprocessor system. The scheme is based on applying the SSOR and incomplete
Cholesky preconditioners simultaneously to a partitioned form of the coefficient
matrix A. The two-level preconditioner appears to be especially well-suited for
the case where A has a bordered block diagonal form commonly arising in domain
decomposition or substructuring type problems. (This is joint work with the Ph.D.
student D. J. Pierce, who is now with Boeing Computer Services.)
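
As background for the preconditioned conjugate gradient schemes in items 1 to 3, the following is a minimal serial sketch of conjugate gradients with a single-level SSOR preconditioner. It is not the two-level scheme of the abstract, which combines SSOR with incomplete Cholesky on a partitioned matrix. The function names `ssor_solve` and `pcg`, the dense list-of-lists storage, and the parameter defaults are all assumptions made for illustration.

```python
import math


def ssor_solve(A, r, omega=1.0):
    """Return z with M z = r for the SSOR preconditioner
    M = (D + omega*L) D^{-1} (D + omega*L)^T / (omega*(2 - omega)),
    where A = L + D + L^T is symmetric positive definite."""
    n = len(r)
    scale = omega * (2.0 - omega)
    # Forward sweep: (D + omega*L) y = scale * r
    y = [0.0] * n
    for i in range(n):
        s = scale * r[i] - omega * sum(A[i][j] * y[j] for j in range(i))
        y[i] = s / A[i][i]
    # Backward sweep: (D + omega*L)^T z = D y
    z = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][i] * y[i] - omega * sum(A[j][i] * z[j] for j in range(i + 1, n))
        z[i] = s / A[i][i]
    return z


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(A, b, omega=1.0, tol=1e-10, max_iter=100):
    """Solve A x = b by SSOR-preconditioned conjugate gradients."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual for the zero initial guess
    z = ssor_solve(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Ap = [dot(A[i], p) for i in range(n)]
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = ssor_solve(A, r, omega)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Both SSOR sweeps are triangular solves and therefore largely sequential, which is one reason block-structured or multi-level preconditioners are attractive on parallel machines.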

4. Analysis of a Recursive Least Squares Hyperbolic Rotation
Algorithm for Signal Processing: The application of hyperbolic plane
rotations to the least squares downdating problem arising in windowed recursive
signal processing is studied. A forward error analysis is given to show that the
algorithm can be expected to perform well in the presence of rounding errors,
provided the problem is not too ill-conditioned. The hyperbolic rotation algorithm
is shown to be forward (weakly) stable and, in fact, comparable to an orthogonal
downdating method shown to be backward stable by Stewart. Numerical
comparisons are made with Stewart's method as implemented in LINPACK. These
tests corroborate our error analysis, which indicates that the two methods should
result in similar accuracy. However, the hyperbolic scheme under consideration
requires n^2/2 fewer multiplications for each downdating step, where n is the
number of least squares filter coefficients. In addition, it is much more amenable to
implementation on a variety of vector and parallel machines. (This is joint work
with the Ph.D. student C. T. Pan and with S. T. Alexander from the NCSU
Department of Electrical and Computer Engineering.)
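
The downdating step discussed above can be illustrated with a minimal sketch. Given an upper triangular factor R and a data row z to be removed, it produces a new factor whose Gram matrix is R^T R - z z^T. This is a textbook form of hyperbolic downdating and is not necessarily the exact variant analyzed in the paper. The function name `cholesky_downdate` and the dense list-of-lists storage are assumptions.

```python
import math


def cholesky_downdate(R, z):
    """Return upper triangular R1 with R1^T R1 = R^T R - z z^T.

    R is an n-by-n upper triangular list of lists with nonzero diagonal
    (assumed), and z is the row of length n being removed.
    """
    n = len(z)
    R = [row[:] for row in R]   # work on copies
    z = list(z)
    for k in range(n):
        t = z[k] / R[k][k]
        if abs(t) >= 1.0:
            raise ValueError("downdated matrix is not positive definite")
        ch = 1.0 / math.sqrt(1.0 - t * t)   # cosh of the rotation angle
        sh = t * ch                         # sinh of the rotation angle
        # Apply the hyperbolic rotation to row k of R and to z;
        # this annihilates z[k] and leaves R^T R - z z^T unchanged.
        for j in range(k, n):
            rkj, zj = R[k][j], z[j]
            R[k][j] = ch * rkj - sh * zj
            z[j] = ch * zj - sh * rkj
    return R
```

Each step touches only row k of R and the working copy of z, so the inner loop over j is a pair of vector operations, which is consistent with the remark that the scheme suits vector and parallel machines.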

5. Numerical Properties of a Hyperbolic Rotation Method for
Windowed RLS Filtering: Numerical properties of the hyperbolic rotation
method for windowed RLS filtering are examined. This matrix-oriented approach
is important from two standpoints: (1) it provides the LS predictor for a sliding
window block of data, and (2) it is amenable to parallel implementation. It is
shown how a hyperbolic rotation matrix can be constructed to update the Cholesky
factor as a function of the previous Cholesky factor and the data in the sliding
window. (This is also joint work with the Ph.D. student C. T. Pan and with S. T.
Alexander from the NCSU Department of Electrical and Computer Engineering.)
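
One plausible reading of the sliding-window step is sketched below: the newest data row is folded into the Cholesky factor with orthogonal (Givens) rotations, and the oldest row is then removed with hyperbolic rotations. The abstract does not spell out the construction, so the two-stage ordering, the names `fold_row` and `slide_window`, and the dense storage are assumptions made for illustration.

```python
import math


def fold_row(R, z, remove):
    """Return upper triangular R1 with R1^T R1 = R^T R + z z^T (remove=False)
    or R^T R - z z^T (remove=True). R is assumed to have a nonzero diagonal."""
    n = len(z)
    R = [row[:] for row in R]
    z = list(z)
    sign = -1.0 if remove else 1.0   # -1 selects a hyperbolic rotation
    for k in range(n):
        r, x = R[k][k], z[k]
        d2 = r * r + sign * x * x
        if d2 <= 0.0:
            raise ValueError("resulting matrix is not positive definite")
        d = math.sqrt(d2)
        c, s = r / d, x / d          # cos/sin or cosh/sinh of the angle
        for j in range(k, n):
            rkj, zj = R[k][j], z[j]
            R[k][j] = c * rkj + sign * s * zj
            z[j] = c * zj - s * rkj  # z[k] becomes zero
    return R


def slide_window(R, new_row, old_row):
    """Advance the window by one sample: add new_row, then drop old_row."""
    R = fold_row(R, new_row, remove=False)   # Givens update
    return fold_row(R, old_row, remove=True)  # hyperbolic downdate
```

Adding the new row before removing the old one keeps the intermediate Gram matrix at least as well conditioned as the starting one, which is a design choice of this sketch rather than a claim about the method in the abstract.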

6. A Sharp Bound for Products of Hyperbolic Plane Rotations: An
algorithm for downdating a least squares problem using hyperbolic plane rotations
has recently been presented and analyzed by Alexander, Pan and Plemmons. Their
analysis of the numerical stability of the algorithm rests on the existence of a tight
bound on the product of norms of a certain collection of hyperbolic rotations. The
main result of this paper establishes the required tight bound by use of combinatorial
relationships. (This is joint work between the Ph.D. student C. T. Pan and K.
Sigmon of the University of Florida.)

7. Parallel Algorithms for Least Squares and Related Material: This
report is concerned with the solution of large-scale least squares problems. Special
attention is placed on those least squares problems arising in a variety of scientific
and engineering problems, including geodetic adjustments and surveys, medical
image analysis, molecular structures, partial differential equations and
substructuring techniques in structural engineering. In each of these problems,
matrices A often arise which possess a block angular structure which reflects the
local connection nature of the underlying problem. A new direct-iterative method
is proposed which corresponds to a new preconditioner for conjugate gradient type
algorithms involving the coefficient matrix A. This preconditioner is based on an
incomplete hyperbolic Cholesky factorization of the normal equations, without
explicit formation of the system. The preconditioner is fully capable of exploiting
the block structure of A, which arises in the applications mentioned above,
resulting in an efficient parallel implementation. (This is the Abstract of the Ph.D.
dissertation by D. J. Pierce.)

8. Hyperbolic Rotations for Downdating the Cholesky Factorization
with Applications to Signal Processing: In many applications the rank one
modification (i.e., updating or downdating) of the Cholesky factorization of a
positive definite matrix A is an important computation. There are two standard
downdating algorithms: (1) the LINPACK algorithm, which is based on orthogonal
Givens rotations, and (2) the hyperbolic rotation algorithm, which is similar to the
use of Givens rotations except that hyperbolic functions are used. The LINPACK
method is known to be backward stable while the hyperbolic algorithm is faster.

This dissertation presents a complete forward error analysis of the hyperbolic
rotation algorithm and shows that it is forward (or weakly) stable. A new
algorithm for downdating Cholesky factorizations is proposed and tested. This new
method is faster than the hyperbolic algorithm and is expected to be as stable as the
LINPACK method. Applications of downdating schemes to recursive least squares
filtering methods in signal processing are also developed. (This is the Abstract of
the Ph.D. dissertation by C. T. Pan.)
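
The contrast between the two rotation types named in this abstract can be written out explicitly. The block below is a standard textbook formulation offered as a sketch. The symbols Q, H, J, theta, r and x are notation introduced here rather than taken from the dissertation, and r > 0 is assumed.

```latex
% Givens (orthogonal) rotation: preserves r^2 + x^2
Q = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},
\qquad Q^{T} Q = I,
\qquad Q \begin{pmatrix} r \\ x \end{pmatrix}
     = \begin{pmatrix} \sqrt{r^{2} + x^{2}} \\ 0 \end{pmatrix}
\quad \text{when } \tan\theta = x / r .

% Hyperbolic rotation: preserves r^2 - x^2, with J = diag(1, -1)
H = \begin{pmatrix} \cosh\theta & -\sinh\theta \\ -\sinh\theta & \cosh\theta \end{pmatrix},
\qquad H^{T} J H = J,
\qquad H \begin{pmatrix} r \\ x \end{pmatrix}
     = \begin{pmatrix} \sqrt{r^{2} - x^{2}} \\ 0 \end{pmatrix}
\quad \text{when } \tanh\theta = x / r, \; |x| < r .

% Unlike Q, the matrix H is not norm-preserving:
\| H \|_{2} = \cosh\theta + |\sinh\theta| \;\ge\; 1 .
```

Because the norm of H grows as |x| approaches r, bounds on products of such norms matter for error analysis, which is the subject of item 6 above.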


III. Abstracts of Research in Progress


1. A New Look at the Force Optimization Method in Structural
Mechanics: The Force Optimization Method of structural analysis was overtaken
by the Matrix Displacement Method in the mid-sixties, and has disappeared from
the scene except for a few specialized applications, as reported by several authors.
The great majority of general purpose finite element programs are now based on
the Displacement Stiffness method. The key weakness in the past for the Force
Optimization Method on the computer has been the difficulty of automating matrix
optimization schemes in an efficient and stable manner. However, recent
developments in the area of fast algorithms for local and global optimization by,
e.g., R. Byrd and B. Schnabel, have spurred a revival of interest in the Force
Method. Quoting from R. Gallagher, "Innovation in the Stiffness Method is
reaching a plateau of diminishing return. It is prudent to revisit the Force Method
and tap its usefulness". With this in mind, we are beginning a study of fresh new
approaches to the optimization based Force Method on modern multiprocessor
systems. We are attempting to automate the Force Method for implementation on
parallel processors such as the Alliant FX/8 and the Sequent Balance. In this regard,
element-by-element preconditioning schemes are generating some exciting new
interest in this approach to computational structural mechanics. (This is joint work
with R. E. White.)

2. Parallel Factorization Schemes for Minimizing a Sum of Euclidean
Norms: The problem of minimizing a weighted sum of Euclidean norms is being
considered. Applications include minimal surface computations. A robust parallel
algorithm, based on a line-search Newton method, is being developed which
takes advantage of the structure of the problem in order to fully utilize
vectorization and concurrency in the computations. The proposed method can
achieve high performance, especially on a machine with an architecture that
combines vector and parallel capabilities on a two-level shared memory structure,
such as that present on the Alliant FX/8 system. (This represents joint work with S.
J. Wright.)

3. Parallel Rank One Matrix Modifications on Distributed Memory
Architectures: Here we are developing and testing parallel least squares
updating and downdating schemes on a 64-node Intel Hypercube. The purpose is to
design near real-time algorithms for signal processing applications. The results
thus far have been very encouraging. One of the students who will be involved with
this project next year has spent the summer at Oak Ridge developing code on their
iPSC64 system. (This is joint work with the student C. Henkel, who is majoring in
Nuclear Engineering at NCSU, and M. T. Heath at the Oak Ridge National
Laboratory.)


4. Fast Algorithms for Updating Least Squares Computations: The
typical parallel bottleneck in updating and downdating least squares computations
involving an observation matrix X is the solution of triangular systems of
equations associated with the Cholesky factor R for X. (Triangular solvers are
inherently serial, although some recent progress has been made in parallel schemes
for distributed memory systems.) Our key results in this project involve the
development of new parallel algorithms for updating or downdating the inverse
matrix R^{-1}, thus avoiding triangular solvers altogether. The results we are
obtaining here may very well have a significant impact on DOD researchers

SIAM Conf. on Lin. Alg. in Signals, Systems and Control, Ed. by B. Datta and R. J.
Plemmons, (1987), to appear. (with D. J. Pierce)

10. An efficient parallel scheme for minimizing a sum of Euclidean
norms, submitted to the SIAM J. on Scientific and Statistical Computing,
(1987). (with S. J. Wright)


VI.  Other  Activities 


1.  Invited  Conference  Lectures: 

(a). First World Congress on Computational Mechanics, Austin, TX (1986).
(with M. Berry)

(b). ICIAM'87 Minisymposium on Linear Algebra in Systems and Control, Paris,
France (1987).

2.  Colloquium  Lectures: 

(a). Argonne National Laboratory, Argonne, IL (1986).

(b). Air Force Office of Scientific Research, Bolling Air Force Base, DC (1987).

(c). INRIA, Rennes University, Rennes, France (1987).


3.  Conference  Organizing  Committees: 

(a). SIAM Conference on Lin. Alg. in Signals, Systems and Control, Boston, MA
(1987).

(b). First World Congress on Computational Mechanics, Austin, TX (1986).

(c). ICIAM'87 Minisymposium on Linear Algebra in Systems and Control, Paris,
France (1987).

(d). Third SIAM Conference on Applied Linear Algebra, Madison, WI, to be held
(1988).

4.  Editorial  and  Other  Activities: 

(a). Elected Member, SIAM Council.

(b). Advisory Editor, Linear Algebra and Applications.

(c). Editorial Board, SIAM J. on Algebraic and Discrete Methods.

(d). Associate Managing Editor, SIAM J. on Matrix Analysis and Applications.